Chapter 15

South and Central Asia-IV 

Other Historic Scripts 


This chapter documents other historic scripts of South and Central Asia. The following
scripts are described in this chapter:

Syloti Nagri 

Khojki 

Nandinagari 

Kaithi 

Khudawadi 

Grantha 

Sharada 

Multani 

Ahom 

Takri 

Tirhuta 

Sora Sompeng 

Siddham 

Modi 

Dogra 

Mahajani 




Most of these scripts are historically related to the other scripts of India, and most are
ultimately derived from the Brahmi script. None of them were standardized in ISCII. The
encoding for each script is done on its own terms, and the blocks do not make use of a
common pattern for the layout of code points.

This introduction briefly identifies each script, occasionally highlighting the most salient 
distinctive attributes of the script. Details are provided in the individual block descriptions 
that follow. 

Syloti Nagri is used to write the modern Sylheti language of northeast Bangladesh and 
southeast Assam in India. 

Kaithi is a historic North Indian script, closely related to the Devanagari and Gujarati 
scripts. It was used in the area of the present-day states of Bihar and Uttar Pradesh in 
northern India, from the 16th century until the early 20th century. 

Sharada is a historical script that was used to write Sanskrit, Kashmiri, and other languages
of northern South Asia; it was the principal inscriptional and literary script of Kashmir
from the 8th century CE until the 20th century. It has limited and specialized modern use.

Takri, descended from Sharada, is used in northern India and surrounding countries. It is 
the traditional writing system for the Chambeali and Dogri languages, as well as several 
“Pahari” languages. In addition to popular usage for commercial and informal purposes, 
Takri served as the official script of several princely states of northern and northwestern 
India from the 17th century until the middle of the 20th century. 

Siddham is another Brahmi-based writing system related to Sharada, and structurally
similar to Devanagari. It originated in India, and was used across South, Central, and East
Asia, and is presently predominantly used in East Asia. Originally used for writing
Buddhist manuscripts, the script is still used by Japanese Buddhist communities.

Mahajani is a Brahmi-based alphabet commonly used by bankers and money lenders 
across northern India until the middle of the 20th century. It is a specialized commercial 
script used for writing accounts and financial records. Mahajani has similarities to Landa, 
Kaithi, and Devanagari. 

Khojki is a writing system used by the Nizari Ismaili community of South Asia for recording
religious literature. It is one of two Landa scripts (the other being Gurmukhi) that
were developed into formal liturgical scripts for use by religious communities. It is still
used today.

Khudawadi is a Landa-based script that was used to write the Sindhi language spoken in 
India and Pakistan. It is related to Sharada. Known as the shopkeeper and merchant script, 
it was used for routine writing, accounting, and other commercial purposes. 

The Multani script was used to write the Seraiki language of eastern and southeastern
Pakistan during the 19th and 20th centuries. Multani is related to Gurmukhi and more
distantly related to Khudawadi and Khojki. It was used for routine writing and commercial
activities.

Tirhuta, another Brahmi-based script, is related to the Bengali, Newari, and Oriya scripts. 
Tirhuta is the traditional writing system for the Maithili language, which is spoken by more 
than 35 million people in parts of India and Nepal. Maithili is an official regional language 
of India and the second most spoken language in Nepal. 

Modi is another Brahmi-based script mainly used to write Marathi, a language spoken in
western and central India. It emerged in the 16th century and derives from the Nagari
scripts. It remains in limited use today.

Nandinagari is a Brahmi-based abugida that was used in southern India between the 11th 
and 19th centuries for manuscripts and inscriptions in Sanskrit. It is related to Devanagari. 
The script was also used for writing Kannada in Karnataka. 

Grantha, a script with a long history, is used to write the Sanskrit language in parts of South 
India, Sri Lanka and elsewhere. It is in daily use by Vedic scholars and Hindu temple priests. 

Ahom is a script of northeast India that dates to about the 16th century and was used
primarily to write the Tai Ahom language. The script has seen a revival in the 20th century,
and continues in some use today.

Sora Sompeng is used to write the Sora language spoken by the Sora people, who live in 
eastern India between the Oriya- and Telugu-speaking populations. The script was created 
in 1936 and is used in religious contexts. 

During the 17th century, the Brahmi-based Dogra script was used to write the Dogri
language in Jammu and Kashmir in the northern region of the Indian subcontinent. The
Dogra script was standardized in the 1860s, and is closely related to the Takri script. Dogri
is now usually written with the Devanagari script.

15.1 Syloti Nagri

Syloti Nagri: U+A800-U+A82F

Syloti Nagri is a lesser-known Brahmi-derived script used for writing the Sylheti language.
Sylheti is an Indo-European language spoken by some 5 million speakers in the Barak
Valley region of northeast Bangladesh and southeast Assam in India. Worldwide there may be
as many as 10 million speakers. Sylheti has commonly been regarded as a dialect of
Bengali, with which it shares a high proportion of vocabulary.

The Syloti Nagri script has 27 consonant letters with an inherent vowel of /o/ and 5
independent vowel letters. There are 5 dependent vowel signs that are attached to a consonant
letter. Unlike Devanagari, there are no vowel signs that appear to the left of their associated
consonant.

Only two proper diacritics are encoded to support Syloti Nagri: anusvara and hasanta.
Aside from its traditional Indic designation, anusvara can also be considered a final form
for the sequence /-ng/, which does not have a base glyph in Syloti Nagri because it does not
occur in other positions. Anusvara can also occur with the vowels U+A824 syloti nagri
vowel sign i and U+A826 syloti nagri vowel sign e, creating a potential problem
with the display of both items. It is recommended that anusvara always occur in sequence
after any vowel signs, as a final character.
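
The ordering recommendation above can be checked mechanically. The following is a minimal sketch, not a normative algorithm: the function name is hypothetical, and the code points for anusvara (U+A80B) and the dependent vowel signs (U+A823..U+A827) are taken from the Syloti Nagri code chart rather than from the text above, so they should be verified before use.

```python
# Assumed code points from the Syloti Nagri block (U+A800-U+A82F).
ANUSVARA = "\uA80B"  # SYLOTI NAGRI SIGN ANUSVARA (assumed from the code chart)
# The five dependent vowel signs, assumed to occupy U+A823..U+A827.
VOWEL_SIGNS = {chr(cp) for cp in range(0xA823, 0xA828)}


def anusvara_after_vowel_signs(text: str) -> str:
    """Return text with any anusvara moved after the vowel signs that follow it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        # Swap an anusvara with a directly following vowel sign; the loop then
        # continues from the anusvara's new position, so it moves past runs too.
        if chars[i] == ANUSVARA and chars[i + 1] in VOWEL_SIGNS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        i += 1
    return "".join(chars)
```

Text that already places anusvara last is returned unchanged, so the function can be applied to input of unknown quality.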

Virama and Conjuncts. Syloti Nagri is atypical of Indic scripts in use of the virama
(hasanta) and conjuncts. Conjuncts are not strictly correlated with the phonology being
represented. They are neither necessary in contexts involving a dead consonant, nor are they
limited to such contexts. Hasanta was only recently introduced into the script and is used
only in limited contexts. Conjuncts are not limited to sequences involving dead consonants
but can be formed from pairs of characters of almost any type (consonant, independent
vowel, dependent vowel) and can represent a wide variety of syllables. It is generally
unnecessary to overtly indicate dead consonants with a conjunct or explicit hasanta. The only
restriction is that an overtly rendered hasanta cannot occur in connection with the first
element of a conjunct. The absence of hasanta does not imply a live consonant and has no
bearing on the occurrence of conjuncts. Similarly, the absence of a conjunct does not imply
a live consonant and has no bearing on the occurrence of hasanta.

Digits. There are no unique Syloti Nagri digits. When digits do appear in Syloti Nagri texts, 
they are generally Bengali forms. Any font designed to support Syloti Nagri should include 
the Bengali digits because there is no guarantee that they would otherwise exist in a user’s 
computing environment. They should use the corresponding Bengali block code points, 
U+09E6..U+09EF. 

Punctuation. With the advent of digital type and the modernization of the Syloti Nagri 
script, one can expect to find all of the traditional punctuation marks borrowed from the 
Latin typography: period, comma, colon, semicolon, question mark, and so on. In addition, 
the Devanagari single danda and double danda are used with great frequency. 

Poetry Marks. Four native poetry marks are included in the Syloti Nagri block. The script
also makes use of U+2055 flower punctuation mark (in the General Punctuation
block) as a poetry mark.

15.2 Kaithi

Kaithi: U+11080-U+110CF 

Kaithi, properly transliterated Kaithī, is a North Indian script, related to the Devanagari
and Gujarati scripts. It was used in the area of the present-day states of Bihar and Uttar
Pradesh in northern India.

Kaithi was employed for administrative purposes, commercial transactions,
correspondence, and personal records, as well as to write religious and literary materials. As a means
of administrative communication, the script was in use at least from the 16th century until
the early 20th century, when it was eventually eclipsed by Devanagari. Kaithi was used to
write Bhojpuri, Magahi, Awadhi, Maithili, Urdu, and other languages related to Hindi.

Standards. There is no preexisting character encoding standard for the Kaithi script. The
repertoire encoded in this block is based on the standard form of Kaithi developed by the
British government of Bihar and the British provinces of northwest India in the 19th
century. A few additional Kaithi characters found in manuscripts, printed books, alphabet
charts, and other inventories of the script are also included.

Styles. There are three presentation styles of the Kaithi script, each generally associated 
with a different language: Bhojpuri, Magahi, or Maithili. The Magahi style was adopted for 
official purposes in the state of Bihar, and is the basis for the representative glyphs in the 
code charts. 

Rendering Behavior. Kaithi is a Brahmi-derived script closely related to Devanagari. In 
general, the rules for Devanagari rendering apply to Kaithi as well. For more information, 
see Section 12.1, Devanagari. 

Vowel Letters. An independent Kaithi letter for vocalic r is represented by the consonant- 
vowel combination: U+110A9 kaithi letter ra and U+110B2 kaithi vowel sign ii. 

In print, the distinction between short and long forms of i and u is maintained. However, 
in handwritten text, there is a tendency to use the long vowels for both lengths. 

Consonant Conjuncts. Consonant clusters were handled in various ways in Kaithi. Some
spoken languages that used the Kaithi script simplified clusters by inserting a vowel
between the consonants, or through metathesis. When no such simplification occurred,
conjuncts were represented in different ways: by ligatures, as the combination of the
half-form of the first consonant and the following consonant, with an explicit virama (U+110B9
kaithi sign virama) between two consonants, or as two consonants without a virama.

Consonant conjuncts in Kaithi are represented with a virama between the two consonants 
in the conjunct. For example, the ordinary representation of the conjunct mba would be by 
the sequence: 

U+110A7 kaithi letter ma + U+110B9 kaithi sign virama + 
U+110A5 kaithi letter ba 

Consonant conjuncts may be rendered in distinct ways. Where there is a need to render
conjuncts in the exact form as they appear in a particular source document, U+200C zero
width non-joiner and U+200D zero width joiner can be used to request the appropriate
presentation by the rendering system. For example, to display the explicitly ligated
glyph for the conjunct mba, U+200D zero width joiner is inserted after the virama:

U+110A7 kaithi letter ma + U+110B9 kaithi sign virama +
U+200D zero width joiner + U+110A5 kaithi letter ba

To block use of a ligated glyph for the conjunct, and instead to display the conjunct with an
explicit virama, U+200C zero width non-joiner is inserted after the virama:

U+110A7 kaithi letter ma + U+110B9 kaithi sign virama +
U+200C zero width non-joiner + U+110A5 kaithi letter ba
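
The three encodings of the conjunct mba described above differ only in the optional joiner placed after the virama. As an illustration only, the sketch below builds each sequence; the function name and the style labels are hypothetical, and whether a ligature actually appears depends on the font and rendering system.

```python
MA = "\U000110A7"      # KAITHI LETTER MA
BA = "\U000110A5"      # KAITHI LETTER BA
VIRAMA = "\U000110B9"  # KAITHI SIGN VIRAMA
ZWJ = "\u200D"         # ZERO WIDTH JOINER: request the ligated form
ZWNJ = "\u200C"        # ZERO WIDTH NON-JOINER: request an explicit virama

# Hypothetical style labels mapped to the joiner inserted after the virama.
_JOINERS = {"default": "", "ligature": ZWJ, "explicit_virama": ZWNJ}


def conjunct(first: str, second: str, style: str = "default") -> str:
    """Encode a two-consonant conjunct, optionally requesting a presentation."""
    return first + VIRAMA + _JOINERS[style] + second


for style in _JOINERS:
    code_points = " ".join(f"U+{ord(c):04X}" for c in conjunct(MA, BA, style))
    print(f"{style}: {code_points}")
```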

Conjuncts composed of a nasal and a consonant may be written either as a ligature with 
the half-form of the appropriate class nasal letter, or the full form of the nasal letter with an 
explicit virama (U+110B9 kaithi sign virama) and consonant. In Grierson’s Linguistic 
Survey of India, however, U+110A2 kaithi letter na is used for all articulation classes, 
both in ligatures and when the full form of the nasal appears with the virama. 

Ruled Lines. Kaithi, unlike Devanagari, does not employ a headstroke. While several
manuscripts and books show a headstroke similar to that of Devanagari, the line is actually a
ruled line used for emphasis, titling or sectioning, and is not broken between individual
letters. Some Kaithi fonts, however, were designed with a headstroke, but the line is not
broken between individual letters, as would occur in Devanagari.

Nukta. Kaithi includes a nukta sign, U+110BA kaithi sign nukta, a dot which is used as 
a diacritic below various consonants to form new letters. For example, the nukta is used to 
distinguish the sound va from ba. The precomposed character U+110AB kaithi letter 
va is separately encoded, and has a canonical decomposition into the sequence of 
U+110A5 kaithi letter ba plus U+110BA kaithi sign nukta. Precomposed characters 
are also encoded for two other Kaithi letters, rha and dddha. 

The glyph for U+110A8 kaithi letter ya may appear with or without a nukta. Because 
the form without the nukta is considered a glyph variant, it is not separately encoded as a 
character. The representative glyph used in the chart contains the dot. The nukta diacritic 
also marks letters representing some sounds in Urdu or sounds not native to Hindi. No 
precomposed characters are encoded in those cases, and such letters must be represented 
by a base character followed by the nukta. 
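
Because U+110AB has a canonical decomposition, text that uses the precomposed letter and text that uses the base letter plus nukta are canonically equivalent and should compare as equal after normalization. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `same_letter` is hypothetical, and the result depends on the Unicode data shipped with the Python build in use.

```python
import unicodedata

VA = "\U000110AB"     # KAITHI LETTER VA (precomposed)
BA = "\U000110A5"     # KAITHI LETTER BA
NUKTA = "\U000110BA"  # KAITHI SIGN NUKTA


def same_letter(a: str, b: str) -> bool:
    """Compare two strings under canonical equivalence (NFD on both sides)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The precomposed letter decomposes to the base letter followed by the nukta.
decomposed = unicodedata.normalize("NFD", VA)
print([f"U+{ord(c):04X}" for c in decomposed])
print(same_letter(VA, BA + NUKTA))
```

Letters that have no precomposed form, such as those used for Urdu sounds, are unaffected by normalization and remain a base character followed by the nukta.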

Punctuation. A number of Kaithi-specific punctuation marks are encoded. Two marks
designate the ends of text sections: U+110BE kaithi section mark, which generally
indicates the end of a sentence, and U+110BF kaithi double section mark, which delimits
larger blocks of text, such as paragraphs. Both section marks are generally drawn so that
their glyphs extend to the edge of the text margins, particularly in manuscripts.

The character U+110BD kaithi number sign is a format control that interacts with digits.
It occurs below a digit or sequence of digits, indicating a numerical reference. The related
character U+110CD kaithi number sign above occurs above a digit or sequence of
digits, and indicates a number in an itemized list, similar to U+2116 numero sign. Like
U+0600 arabic number sign and the other Arabic signs that span numbers (see
Section 9.2, Arabic), these Kaithi format controls precede the numbers they graphically
interact with, rather than following them. U+110BC kaithi enumeration sign is a
standalone, spacing symbol for inline usage.
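
Since the number signs precede the digits they span, a numbered reference is stored in memory as the sign followed by the digits. The sketch below illustrates that order under stated assumptions: the function name is hypothetical, the input is a non-negative integer, and the digits are the Devanagari digits U+0966..U+096F that this section recommends for Kaithi.

```python
NUMBER_SIGN = "\U000110BD"        # KAITHI NUMBER SIGN (drawn below the digits)
NUMBER_SIGN_ABOVE = "\U000110CD"  # KAITHI NUMBER SIGN ABOVE
DEVANAGARI_ZERO = 0x0966          # DEVANAGARI DIGIT ZERO


def kaithi_number(value: int, sign: str = NUMBER_SIGN) -> str:
    """Return the sign followed by the Devanagari digits of a non-negative int."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = "".join(chr(DEVANAGARI_ZERO + int(d)) for d in str(value))
    # The format control comes first in the stored text, then the digits.
    return sign + digits


print([f"U+{ord(c):04X}" for c in kaithi_number(12)])
```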

U+110BB kaithi abbreviation sign, shaped like a small circle, is used in Kaithi to
indicate abbreviations. This mark is placed at the point of elision or after a ligature to indicate
common words or phrases that are abbreviated, in a similar way to U+0970 devanagari
abbreviation sign.

Kaithi makes use of two script-specific dandas: U+110C0 kaithi danda and U+110C1
kaithi double danda.

For other punctuation marks occurring in Kaithi texts, available Unicode characters may
be used. A cross-shaped character, used to mark phrase boundaries, can be represented by
U+002B plus sign. For hyphenation, users should follow whatever is the recommended
practice found in similar Indic script traditions, which might be U+2010 hyphen or
U+002D hyphen-minus. For dot-like marks that appear as word-separators, U+2E31
word separator middle dot, or, if the word boundary is more like a dash, U+2010
hyphen can be used.

Digits. The digits in Kaithi are considered to be stylistic variants of those used in
Devanagari. Hence the Devanagari digits located at U+0966..U+096F should be employed. To
indicate fractions and unit marks, Kaithi uses characters encoded in the Common Indic
Number Forms block, U+A830..U+A839.

15.3 Sharada

Sharada: U+11180-U+111DF 

Sharada is a historical script that was used to write Sanskrit, Kashmiri, and other languages
of northern South Asia. It served as the principal inscriptional and literary script of
Kashmir from the 8th century CE until the 20th century. In the 19th century, expanded use of the
Arabic script to write Kashmiri and the growth of Devanagari contributed to the
marginalization of Sharada. Today the script is employed in a limited capacity by Kashmiri pandits
for horoscopes and ritual purposes.

Rendering Behavior. Sharada is a Brahmi-based script, closely related to Devanagari. In
general, the rules for Devanagari rendering apply to Sharada as well. For more
information, see Section 12.1, Devanagari.

Ruled Lines. While the headstroke is an important structural feature of a character’s glyph 
in Sharada, there is no rule governing the joining of headstrokes of characters to other 
characters. The variation was probably due to scribal preference, and should be handled at 
the font level. 

Virama. U+111C0 sharada sign virama is a spacing mark, written to the right of
the consonant letter it modifies. Semantically, it is identical to the virama of Devanagari
and of other similar Indic scripts.

Candrabindu and Avagraha. U+11180 sharada sign candrabindu indicates
nasalization of a vowel. It may appear in manuscripts in an inverted form but with no semantic
difference. Such glyph variants should be handled in the font. U+111C1 sharada sign
avagraha represents the elision of a word-initial a. Unlike the usual practice in
Devanagari in which the avagraha is written at the normal letter height and attaches to the top
stroke of the following character, the avagraha in Sharada is written at or below the
baseline and does not connect to the neighboring letter.

Jihvamuliya and Upadhmaniya. The velar and labial allophones of /h/, followed by
voiceless velar and labial stops respectively, are written in Sharada with separate signs,
U+111C2 sharada sign jihvamuliya and U+111C3 sharada sign upadhmaniya. These two
signs have the properties of a letter and appear only in stacked conjuncts without the use of
virama. Jihvamuliya is used to represent the velar fricative [x] in the context of a following
voiceless velar stop:

U+111C2 jihvamuliya + U+11191 ka
U+111C2 jihvamuliya + U+11192 kha

Upadhmaniya is used to represent the bilabial fricative [ɸ] in the context of a following
voiceless labial stop:

U+111C3 upadhmaniya + U+111A5 pa
U+111C3 upadhmaniya + U+111A6 pha
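
The four sequences above share one pattern: the sign is followed directly by the stop, with no virama in between. The following sketch encodes that pattern; the function name and the validation step are illustrative assumptions, not requirements stated by this section.

```python
JIHVAMULIYA = "\U000111C2"  # SHARADA SIGN JIHVAMULIYA
UPADHMANIYA = "\U000111C3"  # SHARADA SIGN UPADHMANIYA
KA, KHA = "\U00011191", "\U00011192"  # voiceless velar stops
PA, PHA = "\U000111A5", "\U000111A6"  # voiceless labial stops
VIRAMA = "\U000111C0"       # SHARADA SIGN VIRAMA (deliberately not used below)

# Each sign is paired with the stops it is described as preceding.
_FOLLOWERS = {JIHVAMULIYA: {KA, KHA}, UPADHMANIYA: {PA, PHA}}


def fricative_cluster(sign: str, stop: str) -> str:
    """Encode sign + stop as a stacked conjunct, without a virama."""
    if stop not in _FOLLOWERS.get(sign, ()):
        raise ValueError("stop does not match the articulation of the sign")
    return sign + stop


print([f"U+{ord(c):04X}" for c in fricative_cluster(JIHVAMULIYA, KA)])
```
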

Punctuation. U+111C7 sharada abbreviation sign appears after letters or
combinations of letters. It marks the sequence as an abbreviation. A word separator, U+111C8
sharada separator, indicates word and other boundaries. Sharada also makes use of two
script-specific dandas: U+111C5 sharada danda and U+111C6 sharada double
danda.

Digits. Sharada has a distinctive set of digits encoded in the range U+111D0..U+111D9. 
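
Because the ten Sharada digits are encoded contiguously, they can be mapped to and from ASCII digits by a fixed offset. The sketch below assumes the conventional ordering in which U+111D0 is digit zero and U+111D9 is digit nine; the function name is hypothetical.

```python
SHARADA_ZERO = 0x111D0  # first code point of the Sharada digit range

# Translation table from ASCII digits to the Sharada digits.
_TO_SHARADA = {ord("0") + d: chr(SHARADA_ZERO + d) for d in range(10)}


def to_sharada_digits(text: str) -> str:
    """Replace ASCII digits with Sharada digits; other characters are kept."""
    return text.translate(_TO_SHARADA)


print([f"U+{ord(c):04X}" for c in to_sharada_digits("2024")])
```
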

15.4 Takri

Takri: U+11680-U+116CF 

Takri is a script used in northern India and surrounding countries in South Asia, including 
the areas that comprise present-day Jammu and Kashmir, Himachal Pradesh, Punjab, and 
Uttarakhand. It is the traditional writing system for the Chambeali and Dogri languages, as 
well as several “Pahari” languages, such as Jaunsari, Kulvi, and Mandeali. It is related to the 
Gurmukhi, Landa, and Sharada scripts. Like other Brahmi-derived scripts, Takri is an 
abugida, with consonants taking an inherent vowel unless accompanied by a vowel marker 
or the virama (vowel killer). 

Takri is descended from Sharada through an intermediate form known as Devasesa, which 
emerged in the 14th century. Devasesa was a script used for religious and official purposes, 
while its popular form, known as Takri, was used for commercial and informal purposes. 
Takri became differentiated from Devasesa during the 16th century. In its various regional 
manifestations, Takri served as the official script of several princely states of northern and 
northwestern India from the 17th century until the middle of the 20th century. Until the 
late 19th century, Takri was used concurrently with Devanagari, but it was gradually 
replaced by the latter. 

Owing to its use as both an official and a popular script, Takri appears in numerous
records, from manuscripts to inscriptions to postage stamps. There are efforts to revive the
use of Takri for languages such as Dogri, Kishtwari, and Kulvi as a means of preserving
access to these languages’ literatures.

There is no universal, standard form of Takri. Where Takri was standardized, the reformed
script was limited to a particular polity, such as a kingdom or a princely state. The
representative glyphs shown in the code charts are taken mainly from the forms used in a variant
established as the official script for writing the Chambeali language in the former Chamba
State, now in Himachal Pradesh, India. There are a number of other regional varieties of
Takri that have varying letterforms, sometimes quite different from the representative
forms shown in the code charts. Such regional forms are considered glyphic variants and
should be handled at the font level.

Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be
analyzed visually as consisting of multiple parts. Table 15-1 shows the letters that can be
analyzed, the single code point that should be used to represent them in text, and the sequence
of code points resulting from analysis that should not be used.
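
The discouraged sequences in Table 15-1 are not canonically equivalent to the atomic letters, so Unicode normalization will not repair them. A cleanup pass has to replace them explicitly, as in the hedged sketch below; the function name is hypothetical and the mapping is copied from the table.

```python
# Discouraged sequence -> atomic vowel letter, as listed in Table 15-1.
_ATOMIC = {
    "\U00011680\U000116AD": "\U00011681",
    "\U00011686\U000116B2": "\U00011687",
    "\U00011680\U000116B4": "\U00011688",
    "\U00011680\U000116B5": "\U00011689",
}


def use_atomic_vowel_letters(text: str) -> str:
    """Replace each analyzed two-code-point sequence with its atomic letter."""
    for sequence, atomic in _ATOMIC.items():
        text = text.replace(sequence, atomic)
    return text


fixed = use_atomic_vowel_letters("\U00011680\U000116AD")
print([f"U+{ord(c):04X}" for c in fixed])
```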

Consonant Conjuncts. Conjuncts in Takri are infrequent and, when written, consist of two 
consonants, the second of which is always ya, ra, or ha. Takri ya is written as a subjoining 
form; Takri ra can be written as a ligature or a subjoining form; and Takri ha is written as 
a half-form. 

Table 15-1. Takri Vowel Letters

For    Use      Do Not Use
aa     11681    <11680, 116AD>
ai     11687    <11686, 116B2>
o      11688    <11680, 116B4>
au     11689    <11680, 116B5>


Nukta. A combining nukta character is encoded as U+116B7 takri sign nukta. Letters
that take this sign, which occur mainly in loanwords and words from other languages, may be
represented using the base character plus nukta.

Headlines. Unlike Devanagari, headlines are not generally used in Takri. However,
headlines do appear in the glyph shapes of certain Takri letters. The headline is an intrinsic
feature of glyph shapes in some regional varieties such as Dogra Akkhar, where it appears to
be inspired by the design of Devanagari characters. There are no fixed rules for the joining
of headlines. For example, the headlines of two sequential characters possessing headlines
are left unjoined in Chambeali, while the headlines of a letter and a vowel sign are joined in
printed Dogra Akkhar.

Punctuation. Takri uses U+0964 devanagari danda and U+0965 devanagari double 
danda from Devanagari. 

Fractions. Fraction signs and currency marks found in Takri documents use the characters
in the Common Indic Number Forms block (U+A830..U+A83F).

15.5 Siddham

Siddham: U+11580-U+115FF 

Siddham is a Brahmi-based writing system that originated in India, and is presently used
primarily in East Asia. The script is also known as Siddhamatrka and Kutila. The name
Siddhamatrka has broad historic and regional usage throughout India and East Asia.
However, modern usage is most strongly associated with the Shingon and Tendai Buddhist
traditions in Japan, where the script is also known as Bonji. The representative glyphs in
the code charts are based upon Japanese forms of Siddham characters.

The historical record shows the use of Siddham in Central Asia, but the predominant
examples are of its use for writing Sanskrit in China, Japan, and Korea, notably for
Buddhist manuscripts. Today, it is mainly used for ceremonial and ritualistic purposes
associated with esoteric Buddhist practices.

Siddham is most closely related to Sharada, another Brahmi-based script that originated in 
Kashmir. 

Nukta. The sign U+115C0 siddham sign nukta is used for transcribing sounds that are
not native to the writing system. The nukta sign is not a traditional Siddham character, but
it is part of modern Siddham, so that it can accommodate the writing of Japanese and
English.

Vowels. The Siddham vowel signs for u and uu may appear in two forms. The regular 
forms, called “cloud” forms, are represented by U+115B2 siddham vowel sign u and 
U+115B3 siddham vowel sign uu. Alternate vowel sign forms, referred to as “warbler” 
forms, are represented instead by U+115DC siddham vowel sign alternate u and 
U+115DD siddham vowel sign alternate uu. 

The combination of ra and u should be written with the sequence <U+115A8 siddham
letter ra, U+115DC siddham vowel sign alternate u>. For the combination of ra and uu,
the alternate form should likewise be employed, represented by the sequence
<U+115A8 siddham letter ra, U+115DD siddham vowel sign alternate uu>.
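
An input method or cleanup tool could apply this recommendation automatically. The sketch below is one possible approach, not a prescribed one: the function name is hypothetical, and only a vowel sign placed directly after ra is handled.

```python
RA = "\U000115A8"       # SIDDHAM LETTER RA
SIGN_U = "\U000115B2"   # SIDDHAM VOWEL SIGN U
SIGN_UU = "\U000115B3"  # SIDDHAM VOWEL SIGN UU
ALT_U = "\U000115DC"    # SIDDHAM VOWEL SIGN ALTERNATE U
ALT_UU = "\U000115DD"   # SIDDHAM VOWEL SIGN ALTERNATE UU

_ALTERNATES = {SIGN_U: ALT_U, SIGN_UU: ALT_UU}


def prefer_alternate_after_ra(text: str) -> str:
    """Swap u/uu for the alternate signs when they directly follow ra."""
    out = []
    for ch in text:
        if out and out[-1] == RA and ch in _ALTERNATES:
            ch = _ALTERNATES[ch]
        out.append(ch)
    return "".join(out)


print([f"U+{ord(c):04X}" for c in prefer_alternate_after_ra(RA + SIGN_U)])
```

Vowel signs after any other letter are left as they are, since both forms are valid there.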

Virama and Conjuncts. The virama, U+115BF siddham sign virama, is identical to the
corresponding character in Devanagari and silences the inherent vowel of a consonant.
The default rendering of the Siddham virama is as a visible sign.

Consonant clusters in Siddham are written as conjuncts and follow the same model as
conjuncts in Devanagari. Conjuncts are represented using the Siddham virama, which is
written between each consonant in the cluster. Conjuncts may be written vertically,
horizontally, or as independent ligatures. There are traditional Chinese and Japanese
tabulations for Siddham conjuncts.

Siddham conjuncts may represent clusters with a large number of consonants. For
example, rksvrya is a conjunct cluster produced by a sequence of six consonants, as shown in
Figure 15-1.

Figure 15-1. Siddham Consonant Cluster

[Figure: the consonants r, k, s, v, r, ya forming one conjunct cluster]
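
In encoded text, a cluster such as the one in Figure 15-1 is the list of consonants with a virama between each pair. The sketch below shows that construction under stated assumptions: only U+115A8 (ra) and U+115BF (virama) are given in this section, while the code points for ka, ssa, va, and ya are taken from the Siddham code chart, and the s of the cluster is assumed to be the retroflex letter ssa. Verify these before relying on them.

```python
VIRAMA = "\U000115BF"  # SIDDHAM SIGN VIRAMA
RA = "\U000115A8"      # SIDDHAM LETTER RA
# Assumed from the code chart, not stated in the text of this section:
KA = "\U0001158E"      # SIDDHAM LETTER KA
SSA = "\U000115AC"     # SIDDHAM LETTER SSA (assumed identity of "s")
VA = "\U000115AA"      # SIDDHAM LETTER VA
YA = "\U000115A7"      # SIDDHAM LETTER YA


def cluster(consonants: list[str]) -> str:
    """Join consonants with a virama between each adjacent pair."""
    return VIRAMA.join(consonants)


rksvrya = cluster([RA, KA, SSA, VA, RA, YA])
print(len(rksvrya), rksvrya.count(VIRAMA))
```

Six consonants therefore produce eleven code points: the six letters plus five viramas.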



Head Marks. The mark U+115C1 siddham sign siddham is written at the beginning
of a text. Paleographically, the sign corresponds to characters used in other scripts, such as
U+0FD3 tibetan mark initial brda rnying yig mgo mdun ma. It represents the
Sanskrit word siddham, “accomplished,” and the phrase siddhirastu, “may there be success.” A
vertically-oriented glyph variant is used for vertical text layout.

Repetition Marks. Three marks, U+115C6 siddham repetition mark-1, U+115C7
siddham repetition mark-2, and U+115C8 siddham repetition mark-3, are used to
indicate text repetition. They are written after the text that is to be repeated.

Section Signs. A set of fourteen section marks are used in Siddham to indicate the ends of 
sentences, phrases, verses, and sections. They appear in manuscripts and script manuals. 
According to the Shingon philosophy, the characters possess esoteric qualities that relay 
information regarding the interpretation of the text. 

Punctuation. There are five other punctuation marks encoded for Siddham, as shown in 
Table 15-2. Both Siddham danda and Siddham double danda have graphical variants used 
in informal Japanese writing of Siddham. 


Table 15-2. Siddham Punctuation Characters

Code Point and Name               Purpose
115C2 SIDDHAM DANDA               marks the end of sentences and other short text sections
115C3 SIDDHAM DOUBLE DANDA        used at the end of paragraphs and larger text blocks
115C4 SIDDHAM SEPARATOR DOT       marks boundaries between syllables, words, and phrases;
                                  written at the head-height
115C5 SIDDHAM SEPARATOR BAR       marks boundaries between syllables, words, and phrases
115C9 SIDDHAM END OF TEXT MARK    indicates the end or completion of a text

15.6 Mahajani

Mahajani: U+11150-U+1117F

Mahajani is a Brahmi-based writing system that was commonly used across northern India
until the middle of the 20th century. It is a specialized commercial script used for writing
accounts and financial records. It was used for recording several languages: Hindi,
Marwari, and Punjabi. Mahajani was taught and used as a medium of education in Punjab,
Rajasthan, Uttar Pradesh, Bihar, and Madhya Pradesh in schools where students from
merchant and trading communities learned the script and other writing skills required for
business. The name “Mahajani” refers to bankers and money lenders, who were the
primary users of the script. The majority of Mahajani records are account books. Although the
Mahajani script is no longer in general use, it is an important key to the historical financial
records of northern India.

Mahajani has similarities to Landa, Kaithi, and Devanagari. In structure and orthography, 
Mahajani resembles scripts of the Landa family used in Punjab and Sindh, which are 
related to Sharada. 

Structure. Mahajani is written from left to right. It is based upon the Brahmi model, but it
is structurally simpler and behaves as an alphabet. Vowel signs are not used, and there is no
virama. Consonant clusters are not written in Mahajani using half-forms or ligatures
(except for one ligature for shri), or even a visible virama. The elements of a consonant
cluster are written sequentially using regular consonant letters.

Vowel signs are not written. Consonant letters theoretically bear the inherent vowel /a/,
but the glyph for ka, for example, represents not only ka, but also any one of the syllables
kā, ki, kī, ke, and so on. In cases where greater precision is required, a vowel letter may be
written after a consonant to convey the intended vocalic context. In general, the value of a
consonant letter must be inferred at the morphological level.

Nasalization is not represented using special signs, such as anusvara. Instead U+11167 
mahajani letter na is used in cases where nasalization is explicitly recorded. In several 
cases, words are written simply with nasalization deleted. 

U+11173 mahajani sign nukta is used for writing sounds that are not represented by a 
unique character, such as allophonic variants and sounds that occur in local dialects or in 
loanwords. It has limited use in Mahajani. 

Several letters have glyphic variants. Those variants are not separately encoded. 

Digits. Mahajani does not have distinctive script-specific digits. The Devanagari digits 
located at U+0966..U+096F should be used. 

Other Symbols. Fraction signs and unit marks are found in Mahajani documents, and may
be represented using the characters encoded in the “Common Indic Number Forms”
block.

Punctuation. Mahajani employs a dash, middle dot, and colon, which should be
represented by the corresponding Latin characters. For the dandas, Mahajani employs U+0964
devanagari danda and U+0965 devanagari double danda. Mahajani also contains two
other script-specific punctuation signs, U+11174 mahajani abbreviation sign and U+11175
mahajani section mark. There are no formal rules for punctuation, and word spacing is not
generally observed.
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15.7 Khojki 

Khojki: U+11200-U+1124F 

Khojki is a writing system used by the Nizari Ismaili community of South Asia for recording
religious literature. It was developed in Sindh, now in Pakistan, for representing the
Sindhi language. The script spread to surrounding regions and was used for writing Gujarati,
Punjabi, and Siraiki, as well as several languages related to Hindi. It was also used for
writing Arabic and Persian. Popular Nizari Ismaili tradition states that Khojki was
invented and propagated by Pir Sadruddin, an Ismaili missionary.

Khojki is one of two Landa scripts that were developed into formal liturgical scripts for use 
by religious communities; the other is Gurmukhi, which was developed for writing the 
sacred literature of the Sikh tradition. 

Khojki is also called “Sindhi” and “Khwajah Sindhi.” Khojki was in use by the 16th century
ce, as attested by manuscript evidence. The printing of Khojki books flourished after
Laljibhai Devraj produced metal types for Khojki in Germany for use at his Khoja Sindhi
Printing Press in Mumbai.

While usage of Khojki has declined over the past century, it is still used wherever Nizari
Ismaili Muslims of South Asian origin reside. The largest communities are found in Pakistan,
India, Canada, the United States, the United Kingdom, Kenya, Tanzania, and Uganda. Khojki
primers continue to be published in Pakistan for teaching the script. Khojki manuscripts
and books are used in Ismaili ceremonies not only in South Asia, but also in east and south
Africa, where large diaspora communities formed by the 19th century. The script was also
used by communities related to the Nizari Ismailis, such as the Imamshahis of Gujarat.

Structure. The general structure of Khojki is similar to that of other Brahmi-derived Indic
scripts. It is written from left to right.

Khojki has a smaller repertoire of independent vowel letters than other Brahmi-derived 
scripts. The letters U+11202 khojki letter i and U+11203 khojki letter u are used for 
writing both short and long forms of i and u, respectively. The letters U+11205 khojki 
letter ai and U+11207 khojki letter au represent diphthongs. Although they are 
attested in manuscripts and books, Khojki originally did not have unique letters for these 
vowels. In early Khojki records, diphthongs are generally represented as digraphs. Several 
variant forms of vowel letters are also attested. 

The repertoire of dependent vowel signs is larger than that of independent vowel letters.
There are separate signs for short and long i, U+1122D khojki vowel sign i and U+1122E
khojki vowel sign ii, but there is no sign for uu. Instead, the single sign U+1122F khojki
vowel sign u is used for both short and long forms. U+11232 khojki vowel sign o is often
written by placing the U+11230 khojki vowel sign e element above the consonant letter.

Geminate consonants are marked by U+11237 khojki sign shadda, written above the
consonant letter that is doubled. The positioning may change in relation to vowel signs.

Nasalization is indicated by the sign U+11234 khojki sign anusvara. It is written to the
right of the letter or sign with which it combines.

U+11235 khojki sign virama is identical in function to corresponding characters in
other Indic scripts. It is written to the right of a consonant letter.

U+11236 khojki sign nukta is used for producing characters to represent sounds not 
native to Sindhi. The sign may be written with vowel letters, vowel signs, and consonant 
letters. The nukta is written above a letter. 

Punctuation. Khojki separates words using U+1123A khojki word separator. U+11238
khojki danda and U+11239 khojki double danda are used to mark the end of sentences.
The double danda is also used to mark verse sections. Typically, double danda is
written with U+1123A khojki word separator to the left and right of verse numbers.

Section marks appear frequently in Khojki manuscripts as punctuation that delimits the
end of a section or another larger block of text. U+1123B khojki section mark is
generally used to mark the end of a sentence, while U+1123C khojki double section mark
is used to delimit larger blocks of text, such as paragraphs. Both generally extend to the
margin of the text block.

Latin punctuation marks are also used in printed Khojki. 

U+1123D khojki abbreviation sign is used for marking abbreviations. 

Digits. Khojki makes use of Gujarati digits U+0AE6 through U+0AEF.

15.8 Khudawadi

Khudawadi: U+112B0-U+112FF 

Khudawadi is a script used historically for writing the Sindhi language, which is spoken in
India and Pakistan. Official forms of Khudawadi are known as “Hindi Sindhi,” “Hindu
Sindhi,” and “Standard Sindhi.” Khudawadi is a Landa-based script and is related to Sharada.
Like other Landa writing systems, Khudawadi is a mercantile script used for routine writing,
accounting, and other commercial purposes, and was known as the shopkeeper and
merchant script. It is associated with the merchant communities of Hyderabad, Sindh. In
addition to mercantile records, Khudawadi was used in education, book printing, and for
court records.

In the 1860s, Khudawadi was chosen as the basis for a written standard for education and
administration in Sindh and was developed as an official script. Official Khudawadi
possesses unique characters for each vowel and consonant sound of the Sindhi language, as
well as vowel signs. In the late 19th century, an Arabic-based script became the official
writing system for Sindhi in what are now Pakistan and India. Sindhi is also written in the
Devanagari script in India. Khudawadi is now obsolete.

Structure. The general structure of Khudawadi is similar to that of other Brahmi-based
Indic scripts. It is written from left to right.

Vowel Letters. Some independent vowel letters may be represented using a combination of 
a base vowel letter and a dependent vowel sign. This practice is not recommended. The 
atomic character for the independent vowel letter should always be used. 

Table 15-3. Khudawadi Vowel Letters

For    Use      Do Not Use
aa     112B1    112B0 + 112E0
e      112B6    112B0 + 112E5
ai     112B7    112B0 + 112E6
o      112B8    112B0 + 112E7
au     112B9    112B0 + 112E8
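
The recommendation in Table 15-3 can be sketched as a small text-cleanup step in Python. This is only an illustration under stated assumptions: the names `KHUDAWADI_ATOMIC_VOWELS` and `use_atomic_vowels` are hypothetical, and the mapping contains exactly the five rows of the table and nothing else.

```python
# Discouraged <base vowel letter, vowel sign> pairs from Table 15-3,
# mapped to the atomic independent vowel letter that should be used.
KHUDAWADI_ATOMIC_VOWELS = {
    "\U000112B0\U000112E0": "\U000112B1",  # aa
    "\U000112B0\U000112E5": "\U000112B6",  # e
    "\U000112B0\U000112E6": "\U000112B7",  # ai
    "\U000112B0\U000112E7": "\U000112B8",  # o
    "\U000112B0\U000112E8": "\U000112B9",  # au
}


def use_atomic_vowels(text: str) -> str:
    """Replace each discouraged two-character sequence with its atomic letter."""
    for sequence, atomic in KHUDAWADI_ATOMIC_VOWELS.items():
        text = text.replace(sequence, atomic)
    return text
```

Text that already uses the atomic letters passes through unchanged, so the function can be applied repeatedly without further effect.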


Consonant Conjuncts. Consonant clusters generally consist of two consonants. These are
written using a visible virama. The encoded representation is <C1 + virama + C2>.
Half-forms and ligated conjunct forms are not attested.

Nasalization. U+112DF khudawadi sign anusvara is used for indicating nasalization.

Nukta. U+112E9 khudawadi sign nukta is used for representing sounds not native to
Sindhi, such as those that may occur in Persian and Arabic loanwords. Attested Khudawadi
letters with nukta are shown in Table 15-4, along with the Arabic letters for which they
substitute. The combination ja + nukta, pronounced za, corresponds to a number of
distinct Arabic letters.

Table 15-4. Representation of Arabic Sounds in Khudawadi

Sound    Khudawadi       Arabic
kha      KHA + NUKTA     U+062E ARABIC LETTER KHAH
ġa       GA + NUKTA      U+063A ARABIC LETTER GHAIN
za       JA + NUKTA      U+0630 ARABIC LETTER THAL
                         U+0632 ARABIC LETTER ZAIN
                         U+0636 ARABIC LETTER DAD
                         U+0638 ARABIC LETTER ZAH
fa       PHA + NUKTA     U+0641 ARABIC LETTER FEH


In principle, the nukta may be written with any Khudawadi vowel or consonant letter. If
other combining marks, such as a dependent vowel sign or anusvara, also occur in a
combining sequence applied to that base character, then the convention is to represent the
nukta first in the combining sequence.
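
The ordering convention above can be sketched in Python as follows. The helper name `nukta_first` is hypothetical, and the sketch assumes the caller has already separated the base character from the combining marks that follow it.

```python
KHUDAWADI_NUKTA = "\U000112E9"  # U+112E9 KHUDAWADI SIGN NUKTA


def nukta_first(base: str, marks: str) -> str:
    """Return base + marks with any nukta moved to the front of the marks."""
    nuktas = "".join(m for m in marks if m == KHUDAWADI_NUKTA)
    # The relative order of all other marks is preserved.
    others = "".join(m for m in marks if m != KHUDAWADI_NUKTA)
    return base + nuktas + others
```

For example, a consonant letter followed by a vowel sign and then a nukta is rewritten so that the nukta immediately follows the consonant.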

Punctuation. Khudawadi uses dandas and European punctuation, such as periods,
dashes, colons, and semicolons. Khudawadi dandas are unified with those of Devanagari.
Line breaking for Khudawadi characters follows the rules for Devanagari.

Digits. Khudawadi has a full set of decimal digits. Fraction signs and currency marks are
attested in Khudawadi records. These may be represented using characters in the Common
Indic Number Forms block found at U+A830..U+A83F.

15.9 Multani

Multani: U+11280-U+112AF

The Multani script was used to write the Seraiki language, an Indo-Aryan language spoken 
in the Punjab in eastern Pakistan and the northern Sindh area of southeastern Pakistan. 
Multani is a Landa-based script, related to Gurmukhi, and distantly related to Khudawadi 
and Khojki. The script, also known as Karikki or Sarai, was used for routine writing and 
commercial activities. The first book in the Multani script was published in 1819. By the 
latter half of the 19th century, the British administration introduced the Arabic script as the 
standard for writing the languages of the Sindh, which led to the demise of various non- 
Arabic scripts, including Multani. The script continued to be used into the 20th century. 
Today Seraiki is written in the Arabic script. 

There is no standard form of the Multani script. The representative glyphs shown in the 
code charts are based on printed forms from an 1819 version of the New Testament, with 
additional characters that are found only in handwritten documents. Such variant forms 
are considered glyphic variants and should be handled at the font level. 

The script underwent orthographic changes in the first quarter of the 20th century, with a 
reduction in the character repertoire. The repertoire encoded in this block is based on the 
set of all characters that are distinctly attested. 

Structure. Although Multani is based on the Brahmi model, it is closer in structure to an
abjad than an abugida. There are four independent vowel letters, a, i, u, and e, and no
dependent vowel signs. Consonants theoretically possess the inherent /a/ vowel, but as
vowels are not marked, the actual syllabic vowel of a consonant in running text is ambiguous
and must be inferred from context. Consonant clusters are written using independent
letters, rather than with conjuncts. There is no virama. Vowels are generally not written
unless they occur in isolation, in word-initial position, or in the final position of
monosyllabic words.

The letter U+11280 multani letter a is used to represent /a/, /a:/ and in some sources /e/
and /ae/. The letter U+11281 multani letter i represents /i/ and /i:/ and commonly the
semivowel /j/. The letter U+11282 multani letter u represents /u/, /u:/ and /o/. The letter
U+11283 multani letter e represents /e/, and in some sources /ae/ and /o/.

Digits. The Gurmukhi digits U+0A66..U+0A6F should be employed to represent digits in 
Multani. 

Punctuation. Multani has only one script-specific punctuation mark, U+112A9 multani
section mark, which indicates the end of a sentence.

15.10 Tirhuta

Tirhuta: U+11480-U+114DF

Tirhuta is the traditional writing system for the Maithili language, which is spoken by more
than 35 million people in the state of Bihar in India, and in the Narayani and Janakpur
zones of Nepal. Maithili is an official regional language of India and the second most
spoken language in Nepal. Tirhuta is a Brahmi-based script derived from Gauḍī, or
“Proto-Bengali,” which evolved from the Kutila branch of Brahmi by the 10th century. It is
related to the Bengali, Newari, and Oriya scripts, which are also descended from Gauḍī, and
became differentiated from them by the 14th century.

Tirhuta remained the primary writing system for Maithili until the late 20th century, when
it was replaced by Devanagari. The Tirhuta script forms the basis of scholarly and religious
scribal traditions that have been associated with the Maithili and Sanskrit languages since
the 14th century. Tirhuta continues to be used for writing manuscripts of religious and
literary texts, as well as personal correspondence. Since the 1950s, various literary societies,
such as the Maithili Akademi and Chetna Samiti, have been publishing literary, educational,
and linguistic materials in Tirhuta. The script is also used in signage in Darbhanga
and other districts of north Bihar, and as an optional script for writing the civil services
examination in Bihar.

Although several Tirhuta characters, ligatures or combined shapes bear resemblance to 
those of Bengali, these similarities are superficial. 

Structure. The general structure (phonetic order, matra reordering, use of virama, and so
on) of Tirhuta is similar to that of other Brahmi-based Indic scripts. The script is written
from left to right.

Vowels. Tirhuta uses independent vowel letters and corresponding combining vowel signs. 
The signs U+114BA tirhuta vowel sign short e and U+114BD tirhuta vowel sign 
short o do not have corresponding independent forms, because the sounds they represent 
do not occur in word initial position. 

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as
consisting of multiple parts. Table 15-5 shows the letters that can be analyzed, the single
code point that should be used to represent them in text, and the sequence of code points
resulting from analysis that should not be used.

Table 15-5. Tirhuta Vowel Letters

For           Use      Do Not Use
aa            11482    <11481, 114B0>
vocalic l     11489    <114AA, 114B5>
vocalic ll    1148A    <114AA, 114B6>
ai            1148C    <1148B, 114BA>
au            1148E    <1148D, 114BA>

Consonants. Some of the 33 consonants look like Bengali consonants, but represent different
sounds. For example, U+114A9 tirhuta letter ra has the same form as U+09AC
bengali letter ba, and U+114AB tirhuta letter va has the same shape as U+09B0
bengali letter ra.

Consonants combined with vowel signs, combined in conjuncts, or appearing at the end of
a word commonly use context-dependent ligatures or glyph combinations. These shapes
also contrast with usage in Bengali. For example, the consonant-vowel combination
<U+1149E tirhuta letter ta, U+114B3 tirhuta vowel sign u> in Tirhuta produces
the same shape as the conjunct <U+09A4 bengali letter ta, U+09CD bengali sign
virama, U+09A4 bengali letter ta> in the Bengali script.

All variant forms for letters, character elements, and conjuncts in Tirhuta should be
managed at the font level.

Virama. U+114C2 tirhuta sign virama is identical in function to the corresponding
character in other Indic scripts.

Nasalization. Nasalization is indicated by U+114BF tirhuta sign candrabindu and 
U+114C0 tirhuta sign anusvara. These signs are written centered above the base. If 
written with an above-base sign or a letter with a graphical element that extends past the 
headstroke, they are placed to the right of such signs and elements. 

Characters for Representing Sanskrit. Two characters are attested in Vedic and classical
Sanskrit manuscripts written in Tirhuta. U+114C1 tirhuta sign visarga represents an
allophone of ra or sa at word-final position in Sanskrit orthography. U+114C5 tirhuta
gvang represents nasalization. It belongs to the same class of characters as U+1CE9 vedic
sign anusvara antargomukha, U+1CEA vedic sign anusvara bahirgomukha, and so on.

Tirhuta also uses U+1CF2 vedic sign ardhavisarga, which can be found in the Vedic
Extensions block.

Nukta. U+114C3 tirhuta sign nukta is used for writing sounds that are not represented
by a unique character, such as allophonic variants and sounds that occur in local dialects or
in loanwords. The nukta may be written with any vowel or consonant letter. If other
combining marks, such as a vowel sign or anusvara, also appear with the base character, then
the nukta is written first.

U+114A5 tirhuta letter ba and U+114AB tirhuta letter va have shapes that include 
a dot, but this is not semantically equivalent to a nukta. These letters do not decompose to 
nukta, and are treated as atomic characters. 

Punctuation. Tirhuta uses U+0964 devanagari danda and U+0965 devanagari double 
danda from the Devanagari block. 

Special Signs. U+114C6 tirhuta abbreviation sign denotes abbreviations. There are
also two special script-specific signs in Tirhuta. The first, U+11480 tirhuta anji, is used
in the invocations of letters, manuscripts, books, and charts of the script. The sign anji is
said to represent the tusk of the deity Ganesa, patron of learning. The second, U+114C7
tirhuta om, contrasts with the Bengali sign for om, the latter being a simple combination
of U+0993 bengali letter o plus U+0981 bengali sign candrabindu.

Digits. Tirhuta has a full set of decimal digits. 

Fractions. Number forms and unit marks are also found in Tirhuta documents. The most
common of these are signs for writing fractions and currency, and they are represented
using characters in the Common Indic Number Forms block (U+A830..U+A83F). They
include U+A831 north indic fraction one half, U+A832 north indic fraction
three quarters, and so on, as well as U+A838 north indic rupee mark. Tirhuta also
uses Bengali “currency numerators,” such as U+09F4 bengali currency numerator
one.

15.11 Modi

Modi: U+11600-U+1165F

Modi is a Brahmi-based script used mainly for writing Marathi. Modi was also used to 
write other regional languages such as Hindi, Gujarati, Kannada, Konkani, Persian, Tamil, 
and Telugu. According to an old legend, the Modi script was brought to India from Sri 
Lanka by Hemadri Pandit, known also as Hemadpant, who was the chief minister of 
Ramacandra, the last king of the Yadava dynasty, who reigned from 1271 to about 1309. 
Another tradition credits the creation of the script to Balaji Avaji, secretary of state to the 
late 17th-century Maratha king Shivaji Raje Bhonsle, also known as Chhatrapati Shivaji 
Maharaj. While the veracity of such accounts is difficult to ascertain, it is clear that Modi 
derives from the Nagari family of scripts and is a modification of the Nagari model 
intended for continuous writing. 

Modi emerged as an administrative writing system in the 16th century before the rise of the 
Maratha dynasties. It was adopted by the Marathas as an official script beginning in the 
17th century and was used in such a capacity in Maharashtra until the middle of the 20th 
century. In the 1950s the use of Modi was formally discontinued and the Devanagari script, 
known as “Balbodh,” was promoted as the standard writing system for Marathi. 

There are thousands of Modi documents preserved in South Asia and Europe. The majority
of these are in various archives in Maharashtra, while smaller collections are kept in
Denmark and other countries, because of European presence in Tanjore, Pondicherry, and
other regions in South Asia through the 19th century. The earliest extant Modi document
dates from the early 17th century. While the majority of Modi documents are official
letters, land records, and other administrative documents, the script was also used in
education, journalism, and other routine activities before the 1950s. Printing in Modi began in
the early 19th century after Charles Wilkins cut the first metal fonts for the script in
Calcutta. Newspapers were published in Modi, primers were produced to teach the script in
schools, and various personal papers and diaries were kept in the script.

Structure. Modi is a Brahmi-based script related to Devanagari. It is written from left-to- 
right. In general, the rules for Devanagari rendering also apply to Modi (see Section 12.1, 
Devanagari). However, one characteristic feature of Modi is a large number of context- 
dependent forms of consonants and vowel-signs. Shaping and glyph substitutions for these 
contextual forms are managed in the font. 

Vowel Letters. Generally, the distinction between regular and long forms of i and u is not
preserved in Modi. The letter U+11603 modi letter ii may represent both i and ī, and
U+11604 modi letter u may be used for writing both u and ū. The same can be said of the
corresponding dependent vowel signs. Both regular and long forms appear in the Modi
block, because they are attested in documentation about Modi.

The vocalic vowel signs in the range U+11635..U+11638 are included in the encoding, but
are not in modern use, as is the case in other Indic scripts. Modi vocalic r may alternatively
be written as the sequence <U+11628 modi letter ra, U+11632 modi vowel sign ii>.

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as
consisting of multiple parts. Table 15-6 shows the letters that can be analyzed, the single
code point that should be used to represent them in text, and the sequence of code points
resulting from analysis that should not be used.

Table 15-6. Modi Vowel Letters

For    Use      Do Not Use
e      1160A    <11600, 11639>
ai     1160B    <11600, 1163A>
o      1160C    <11601, 11639>
au     1160D    <11601, 1163A>


Rendering. Many of the consonant-vowel and consonant-consonant combinations in 
Modi involve special contextual forms of the consonant or vowel-sign or both. These are 
rendered by means of contextual rules in the font, using specially shaped and positioned 
glyph pieces or preformed ligatures. 

Consonant Clusters Involving ra. A number of contextual forms are used for U+11628
modi letter ra. Some of these are similar to the use of ra in Devanagari. As the first
consonant in a cluster it is generally rendered as a repha; however, Modi also uses the eyelash
ra in place of repha in certain native Marathi contexts. As in Devanagari, the eyelash ra is
produced using the sequence <U+11628 modi letter ra, U+1163F modi sign virama,
U+200D zero width joiner>.
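
The eyelash ra sequence described above can be assembled as in the following Python sketch. The function name `eyelash_ra_cluster` is hypothetical. The sketch only builds the encoded sequence, and whether the eyelash form actually appears depends on the font, as noted under Rendering.

```python
MODI_LETTER_RA = "\U00011628"    # U+11628 MODI LETTER RA
MODI_SIGN_VIRAMA = "\U0001163F"  # U+1163F MODI SIGN VIRAMA
ZERO_WIDTH_JOINER = "\u200D"     # U+200D ZERO WIDTH JOINER


def eyelash_ra_cluster(following: str) -> str:
    """Return <ra, virama, ZWJ> followed by the next consonant of the cluster."""
    return MODI_LETTER_RA + MODI_SIGN_VIRAMA + ZERO_WIDTH_JOINER + following
```

Omitting the zero width joiner leaves the plain sequence <ra, virama, consonant>, which is generally rendered with a repha instead.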

Non-initial ra in conjuncts is typically rendered using one of two subjoined forms; however,
some conjuncts with ra are represented as distinct ligatures. The most common of
these is the conjunct represented by the sequence <U+1161D modi letter ta,
U+1163F modi sign virama, U+11628 modi letter ra>. Sequences of ra following
some other consonants, such as <ka, ra>, <ka, -aa, ra>, or <sa, ra> are also displayed by
distinct ligatures, as shown in Figure 15-2. The sequence of initial ra followed by the
rounded consonants kha, dha, or ha, may also appear with distinct ligatures.

Figure 15-2. Modi Shaping for ra

1160E + 11628            →  ka-ra ligature
1160E + 11630 + 11628    →  kā-ra ligature
1162D + 11628            →  sa-ra ligature
1162E + 11639 + 11628    →  he-ra ligature

Unusually, the shape of ra is also influenced at the word level, depending upon the
characters in the preceding syllable. See the last example in Figure 15-2. This influence on the
shape of ra may even occur preceding punctuation; in certain environments, ra following a
danda or double danda is written using a special contextual form. For example:

U+11642 modi double danda + U+11628 modi letter ra → special contextual form of ra

To produce this behavior, the danda and double danda characters in the Modi block 
should be used instead of the ones in the Devanagari block. 
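
A minimal Python sketch of this recommendation follows. It assumes the input string is known to be Modi text; the name `use_modi_dandas` is hypothetical. The text above names only U+11642 for the double danda. The use of U+11641 for the single danda is taken from the Modi code chart and should be treated as an assumption of this example.

```python
# Map the Devanagari dandas to their counterparts in the Modi block.
DANDA_MAP = str.maketrans({
    "\u0964": "\U00011641",  # DEVANAGARI DANDA -> MODI DANDA (assumed from the code chart)
    "\u0965": "\U00011642",  # DEVANAGARI DOUBLE DANDA -> MODI DOUBLE DANDA
})


def use_modi_dandas(modi_text: str) -> str:
    """Replace Devanagari dandas in Modi text with the Modi-block dandas."""
    return modi_text.translate(DANDA_MAP)
```

Applying this step before rendering gives the font the Modi-specific context it needs to select the special form of ra after a danda.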

Punctuation and Word Boundaries. Traditionally, word boundaries are not marked in 
Modi because it is an administrative script, characterized by the practice of rapid writing 
without lifting the pen. Paragraph and other section boundaries are, however, indicated in 
some Modi documents through the use of whitespace. Modern practice uses spaces and 
various punctuation conventions, including danda and Western punctuation marks. Some 
printed books use a period instead of a danda to indicate a sentence boundary. 

Various Signs. Nasalization is indicated by U+1163D modi sign anusvara, and
abbreviations are indicated using U+11643 modi abbreviation sign. U+1163E modi sign
visarga represents an allophone of ra or sa at word-final position in Sanskrit orthography.
U+11640 modi sign ardhacandra is used for transcribing sounds used in English names
and loanwords.

U+11644 modi sign huva is written as an invocation in several Modi documents. It is 
derived from the Arabic huwa. 

Currency values are written using U+A838 north indic rupee mark. 

Numbers. Modi has a full set of decimal digits. Several number forms and unit marks are
used for writing Modi and are represented using characters in the Common Indic Number
Forms block. They include the base-16 fraction signs U+A830..U+A835. The absence of
intermediate units is indicated by U+A837 north indic placeholder mark, which is
called ali in Marathi. U+A836 north indic quarter mark is used for representing anna
values.

15.12 Nandinagari

Nandinagari: U+119A0-U+119FF

Nandinagari is a Brahmi-based script that was used in southern India between the 11th
and 19th centuries for manuscripts and inscriptions in Sanskrit in south Maharashtra,
Karnataka, and Andhra Pradesh. It is related to Devanagari, and was the official script of the
Vijayanagara kingdom of southern India (1336-1646 ce). There are numerous manuscripts
and inscriptions containing Nandinagari text. This script was also used for writing
Kannada in Karnataka.

Structure. With minor historical exceptions, Nandinagari is an abugida written from left to
right. As in Devanagari, each consonant letter carries an inherent vowel (usually the sound
/a/). The absence of the inherent vowel is frequently marked with a virama, a combining
character that suppresses the inherent vowel of the consonant.

Headstrokes. Headstrokes are an inherent feature of Nandinagari letters, but their behavior
differs from that of headstrokes in modern Devanagari. Headstroke connections in
Nandinagari generally are restricted to an aksara (orthographic syllable) and do not extend
to neighboring syllables. The headstroke connects vowel or consonant letters and spacing
dependent vowels of an aksara, while spaces separate individual aksaras.

Vowels. There are 12 vowel letters in the range U+119A0..U+119AD and 11 dependent 
vowel signs in the range U+119D1..U+119DD. U+119D2 nandinagari vowel sign i is 
positioned at the top-left edge of letters that have headstrokes. For other letters U+119D2 
hangs above the top-left portion of the body. However, the style of writing the sign varies 
considerably, particularly in handwriting. 

Consonants. There are 35 consonant letters. U+119D0 nandinagari letter rra appears 
to have been introduced in the 11th century for transcribing the Kannada letter RRA, and 
is not part of the traditional repertoire of Nandinagari. 

Virama. U+119E0 nandinagari sign virama has two functions, similar to the corresponding
Devanagari character. Used as a halanta, it marks the absence of the inherent
vowel of a consonant letter. U+119E0 is also a format character used to produce conjuncts.

Vowel Modifiers. U+119DE nandinagari sign anusvara indicates nasalization. It is 
placed to the right of a base letter or right-side vowel sign. U+119DF nandinagari sign 
visarga represents post-vocalic aspiration in words of Sanskrit origin. 

Other Signs. U+119E1 nandinagari sign avagraha marks the elision of word-initial a in 
Sanskrit as a result of sandhi. The auspicious sign U+119E2 nandinagari sign siddham 
indicates an invocation at the beginning of documents. 

Punctuation. U+119E3 nandinagari headstroke is used as a sign of spacing or joining a
word. It may connect a word that is broken on account of imperfections on a writing
surface. U+119E3 can also serve as a gap filler. Nandinagari uses the danda and double danda
marks encoded in the Devanagari block.

Digits. The Nandinagari digits are glyph variants of the Kannada digits U+0CE6..U+0CEF.
No script-specific digits are encoded for Nandinagari.

15.13 Grantha

Grantha: U+11300-U+1137F

The Grantha script descends from Brahmi. The modern form is chiefly used to write the 
Sanskrit language, including Vedic Sanskrit. It is used primarily in Tamil Nadu, and to a 
lesser extent in Sri Lanka and other parts of South India. 

The Grantha script is frequently mixed with the Tamil script to write Sanskrit words.
Grantha has also been used to write the Sanskrit words of Tamil Manipravalam, a mixed
Sanskrit-Tamil language, though this usage has become rare. In addition, Grantha
characters may occasionally be employed with the Tamil script in the writing systems of
minority languages of southern India.

Historically, intermediate forms which gave rise to the Grantha script are attested as of the 
fourth century ce. The earliest examples are found in inscriptions of the early Pallava kings 
who ruled over parts of what is currently northern Tamil Nadu and southern Andhra 
Pradesh. Modern Grantha, which this encoding represents, belongs to the period after the 
thirteenth century ce. 

Modern Grantha is frequently used by Tamil speakers to represent Sanskrit because
Grantha’s large set of letters can represent all the sounds of Sanskrit without the use of
diacritical marks. The Tamil script has a smaller repertoire of letters that requires diacritical
marks to represent Sanskrit directly. This use of diacritical marks often leads to confusion
regarding the pronunciation of Sanskrit when written in the Tamil script.

Rendering Grantha 

Although the Grantha script is visually similar to Tamil, its structure is similar to other
Indic scripts that are used to write Sanskrit. Written Sanskrit requires support for stacked
consonant structures.

Consonant Clusters. Consonant clusters may be written as stacks, as ligatures, or as a
combination of ligatures and stacks. Ligatures are often used instead of stacks, and
combinations of ligatures and stacking are frequent.

The typical stack height found in print in non-Vedic Sanskrit is two elements, but it is three 
in Vedic Sanskrit. Stacks, like ligatures, are equivalent to single consonants for the purpose 
of application of vowel signs. 

Instances requiring more than three elements in a stack require special handling. In these
cases, the initial elements are pushed out of the consonant stack and may form their own
stacks. Such special cases are illustrated in Figure 15-3. In this situation, a single
phonological consonant cluster followed by a vowel may be represented by more than one
orthographic cluster.

Figure 15-3. Splitting Large Conjunct Stacks in Grantha

two elements → two-level stack

three elements → three-level stack

four elements → vowelless element + three-level stack

five elements → vowelless two-level stack + three-level stack

six elements → vowelless three-level stack + three-level stack
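
The splitting pattern in Figure 15-3 can be sketched in Python as follows. The function name `split_stack` is hypothetical. The figure covers two to six elements only; the behavior for seven or more elements (further three-level groups taken from the end) is an assumption of this sketch, not something stated in the text.

```python
from typing import List

MAX_STACK_HEIGHT = 3  # tallest stack shown in Figure 15-3


def split_stack(element_count: int) -> List[int]:
    """Return the stack heights, in reading order, for a consonant cluster."""
    if element_count < 1:
        raise ValueError("a cluster needs at least one element")
    sizes = []
    remaining = element_count
    # Fill the final stack first; leftover initial elements are pushed out
    # into their own (vowelless) stacks in front of it.
    while remaining > 0:
        size = min(MAX_STACK_HEIGHT, remaining)
        sizes.append(size)
        remaining -= size
    sizes.reverse()
    return sizes
```

With this scheme, four elements produce a single vowelless element followed by a three-level stack, and six elements produce two three-level stacks, matching the figure.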

Virama. Grantha follows the same virama model as Telugu and Kannada, in which the 
sequence consonant + virama should be rendered as the vowelless form of the consonant in 
the desired orthographic style. For example, in the prevalent orthographic style used in 
modern printing, ta, na, and ma consistently fuse with the virama; ra and la superficially 
connect with it, and the virama stands apart for all other consonants, as shown in 
Table 15-7. 

Table 15-7. Rendering of Explicit Virama Forms in Grantha

Fused          ta + virama, na + virama, ma + virama
Connected      ra + virama, la + virama
Unconnected    ka + virama, tta + virama



These visual distinctions in the rendering of explicit viramas also apply to the various 
ligated conjuncts of Grantha. 
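
As a rough illustration, the three rendering classes of Table 15-7 can be written as a 
lookup. The names VIRAMA_STYLE and virama_style are hypothetical, the keys are romanized 
consonant names rather than code points, and the table reflects only the prevalent modern 
printing style described above. An actual font would express this in its glyph 
substitution rules, not in application code. 

```python
# Rendering classes for <consonant, virama> in the prevalent modern
# printing style. Consonants not listed here take an unconnected virama.
VIRAMA_STYLE = {
    "ta": "fused",
    "na": "fused",
    "ma": "fused",
    "ra": "connected",
    "la": "connected",
}


def virama_style(consonant):
    """Return how the virama joins the given consonant (romanized name)."""
    return VIRAMA_STYLE.get(consonant, "unconnected")


for name in ("ta", "ra", "ka", "tta"):
    print(name, "+ virama:", virama_style(name))
```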

Vowels. There are two forms of the au vowel sign: U+11357 grantha au length mark is 
the modern one-part form, while the two-part form, U+1134C grantha vowel sign au, is 
somewhat archaic but is found in manuscripts. 

Only two vowel signs touch their base consonant in printed Grantha: U+1133F grantha 
vowel sign i and U+11340 grantha vowel sign ii. U+11347 grantha vowel sign ee 
and U+11348 grantha vowel sign ai are rendered to the left of their base. U+1134B 
grantha vowel sign oo and the archaic U+1134C grantha vowel sign au are two-part 
vowels with one part placed to the left of the base and one part to the right. All other vowel 
signs are placed to the right of the base. 
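
The placement rules for printed Grantha can be summarized in a small Python sketch. The 
function names are hypothetical, and the code assumes that the caller passes the code 
point of a Grantha vowel sign; it does not check that. 

```python
LEFT = {0x11347, 0x11348}        # vowel signs ee and ai
TWO_PART = {0x1134B, 0x1134C}    # vowel sign oo and the archaic vowel sign au
TOUCHING = {0x1133F, 0x11340}    # vowel signs i and ii


def vowel_sign_placement(code_point):
    """Return where a Grantha vowel sign sits relative to its base."""
    if code_point in LEFT:
        return "left"
    if code_point in TWO_PART:
        return "left+right"
    return "right"  # all other vowel signs


def touches_base(code_point):
    """Return True for the two vowel signs that touch their base in print."""
    return code_point in TOUCHING


print(vowel_sign_placement(0x11347))  # left
print(vowel_sign_placement(0x1134B))  # left+right
print(vowel_sign_placement(0x1133F), touches_base(0x1133F))  # right True
```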

Manuscripts written in Grantha will show archaic ligatures of consonants with vowel signs. 
The vowel signs U+11362 grantha vowel sign vocalic l and U+11363 grantha 
vowel sign vocalic ll are sometimes placed below and sometimes placed to the right of 
the base consonant. In contemporary printing practice, vowel signs are placed to the right. 

Signs. Grantha uses the pluta sign to denote vowel lengthening. The pluta is not in current 
use, but it is found in Vedic manuscripts. The nukta is not used to write Sanskrit, but is 
used to transcribe words from other languages, such as Irula. 

Cantillation Marks. Grantha uses a number of cantillation marks to represent tone, stress, 
and breathing in Vedic texts. These marks include the twelve marks encoded in the 
Grantha block in the range U+11366..U+11374, and many encoded in other blocks as 
well, including those listed in Table 15-8. 

Table 15-8. Additional Svara Marks Used in Grantha 

Generic Vedic Accents 

0951 DEVANAGARI STRESS SIGN UDATTA 
0952 DEVANAGARI STRESS SIGN ANUDATTA 

Samavedic Marks 

1CD0 VEDIC TONE KARSHANA 
1CD2 VEDIC TONE PRENKHA 
1CD3 VEDIC SIGN NIHSHVASA 
20F0 COMBINING ASTERISK ABOVE 

Additional Marks 

1CF2 VEDIC SIGN ARDHAVISARGA 
1CF3 VEDIC SIGN ROTATED ARDHAVISARGA 
1CF4 VEDIC TONE CANDRA ABOVE 
1CF8 VEDIC TONE RING ABOVE 
1CF9 VEDIC TONE DOUBLE RING ABOVE 


These nonspacing marks are normally applied to independent vowels, to consonants with 
an inherent vowel, and to consonants with vowel signs. Sometimes they are also applied to 
dead consonants which are displayed with a visible virama. 

The preferred placement of svara marks in Grantha is horizontally centered relative to the 
syllable. These marks should not extend beyond the horizontal span of the base syllable. 
The svara marks can be applied to either syllables or digits, and used in combination with 
each other. 
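
A simple membership test for the marks named in this section might look like the 
following sketch. The names are hypothetical. The range check on the Grantha block is 
deliberately coarse: it accepts every code point from U+11366 through U+11374, whether 
or not each one is assigned, and Table 15-8 is not an exhaustive list of marks from other 
blocks. 

```python
# Marks in the Grantha block: U+11366..U+11374 inclusive.
GRANTHA_BLOCK_MARKS = range(0x11366, 0x11375)

# Marks from other blocks, as listed in Table 15-8.
OTHER_BLOCK_MARKS = {
    0x0951, 0x0952,                  # generic Vedic accents
    0x1CD0, 0x1CD2, 0x1CD3, 0x20F0,  # Samavedic marks
    0x1CF2, 0x1CF3, 0x1CF4, 0x1CF8, 0x1CF9,  # additional marks
}


def is_listed_svara_mark(ch):
    """Return True if ch is one of the marks named in this section."""
    cp = ord(ch)
    return cp in GRANTHA_BLOCK_MARKS or cp in OTHER_BLOCK_MARKS


print(is_listed_svara_mark("\u0951"))      # True
print(is_listed_svara_mark("\U00011375"))  # False
```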

Punctuation. Danda and double danda marks used with Grantha are found in the 
Devanagari block; see Section 12.1, Devanagari. 

Numbers. Grantha makes use of the Tamil digits U+0BE6 through U+0BEF. 
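
Because the ten Tamil digits are contiguous, a non-negative integer can be written out 
for Grantha text with simple arithmetic on code points. The following is a minimal sketch; 
the function name to_tamil_digits is hypothetical. 

```python
TAMIL_DIGIT_ZERO = 0x0BE6  # the Tamil digits 0-9 start at this code point


def to_tamil_digits(number):
    """Return the decimal digits of a non-negative integer as Tamil digits."""
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    return "".join(chr(TAMIL_DIGIT_ZERO + int(d)) for d in str(number))


print(to_tamil_digits(2024))
```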

Ahom: U+11700-U+1173F 

The Ahom script is used in northeast India, primarily to write the Tai Ahom language. The 
oldest surviving Ahom text is the “Snake Pillar” inscription which was inscribed in the time 
of King Siuw Hum Miung (1497-1539). The script also appears on other stone inscriptions, 
coins, brass plates and a large corpus of manuscripts. Although the use of the Tai Ahom 
language declined in the late 17th century, traditional priests used the language and the 
Ahom script in their religious practices throughout the 19th century. 

Modern use of the Ahom script is considered to have begun in 1920 with the publication of 
an Ahom-Assamese-English dictionary. This was followed by publication of other 
dictionaries, word lists, and primers. The publication of Ahom texts has progressed more 
rapidly in recent decades, thanks to the availability of computers. Today there are large 
numbers of books published in Assam that contain some Ahom content. 

Structure. Like most other Brahmi-derived scripts, Ahom is an abugida, in which consonant 
letters are associated with an inherent vowel “a”. The encoding also includes three 
medial consonants, in the range U+1171D..U+1171F, which follow and graphically attach 
to an initial consonant letter. In addition, Ahom has a visible virama that functions as a 
vowel killer, U+1172B ahom sign killer. Use of the killer is obligatory only in modern 
Ahom. 

Vowels. Ahom has no independent vowels, but instead uses U+11712 ahom letter a 
followed by the corresponding dependent vowel sign (or signs). 

Syllabic Structure. Ahom has closed syllables, and optional medials may occur after initial 
consonants. Vowels can occur in sequences of U+11712 ahom letter a and dependent 
vowel signs, or a series of dependent vowel signs. Final consonants take U+1172B ahom 
sign killer. 
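
The syllable structure described above can be sketched as a small builder. The function 
names and the argument order (initial, medial, vowel signs, final) are assumptions made 
for illustration; only the medial is validated, against the range U+1171D..U+1171F. 

```python
LETTER_A = "\U00011712"             # ahom letter a
KILLER = "\U0001172B"               # ahom sign killer
MEDIALS = range(0x1171D, 0x11720)   # U+1171D..U+1171F inclusive


def ahom_syllable(initial, medial="", vowel_signs="", final=""):
    """Assemble one Ahom syllable as a code point sequence."""
    if medial and (len(medial) != 1 or ord(medial) not in MEDIALS):
        raise ValueError("medial must be one of U+1171D..U+1171F")
    syllable = initial + medial + vowel_signs
    if final:
        # A final consonant is followed by the killer.
        syllable += final + KILLER
    return syllable


def ahom_vowel_initial(vowel_signs="", final=""):
    """A vowel-initial syllable is carried by ahom letter a."""
    return ahom_syllable(LETTER_A, vowel_signs=vowel_signs, final=final)


kha = "\U00011701"  # ahom letter kha
closed = ahom_syllable(kha, final=kha)
print(["U+%04X" % ord(c) for c in closed])
```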

Numerals. The original Ahom numeral system was not a decimal radix system; however, in 
modern use a digit zero has been added, and the digits can be used to express decimal radix 
numerals. In traditional use, the digits may also be mixed with word spellings when writing 
out numbers. 

The forms of the Ahom digits are derived from several sources. U+11732 ahom digit two 
is visually identical to U+11701 ahom letter kha and probably derives from it. The digits 
3, 4, and 5 are usually expressed by the Ahom words for those numbers spelled out. 
U+1173B ahom number twenty is also just the Ahom word for 20 spelled out. 

Punctuation. Ahom uses two punctuation characters which function similarly to dandas: 
U+1173C ahom sign small section and U+1173D ahom sign section. The script also 
uses a paragraph mark, U+1173E ahom sign rulai, and a symbol that indicates an 
exclamation, U+1173F ahom symbol vi. 

Modern Ahom uses spaces to indicate word boundaries. This convention is seen in some 
early Ahom manuscripts, but is not consistent in the early material. 

Variant Forms. A number of variant letterforms are found in manuscripts, but are no 
longer used in modern Ahom. Specific characters are encoded to represent the historic 
variants of ta, ga, ba, and the medial ligating ra. 

15.15 Sora Sompeng 

Sora Sompeng: U+110D0-U+110FF 

The Sora Sompeng script is used to write the Sora language. Sora is a member of the 
Munda family of languages, which, together with the Mon-Khmer languages, makes up 
Austro-Asiatic. 

The Sora people live between the Oriya- and Telugu-speaking populations in what is now 
the Odisha-Andhra border area. 

Sora Sompeng was devised in 1936 by Mangei Gomango, who was inspired by the vision 
he had of the 24 letters. The script was promulgated as part of a comprehensive cultural 
program, and was offered as an improvement over IPA-based scripts used by linguists and 
missionaries, and the Telugu and Oriya scripts used by Hindus. Sora Sompeng is used in 
religious contexts, and is published in a variety of printed materials. 

Encoding Structure. The Sora Sompeng script is an alphabet. The consonant letters 
contain an inherent vowel. There are no conjunct characters for consonant clusters, and 
there is no visible vowel killer to show the deletion of the inherent vowel. The reader must 
determine the presence or absence of the inherent schwa based on recognition of each word. 
The character repertoire does not match the phonemic repertoire of Sora very well. 

U+110E4 sora sompeng letter ih is used for both [i] and [ɨ], and U+110E6 sora 
sompeng letter oh is used for both [o] and [ɔ], for instance. The glottal stop is written 
with U+110DE sora sompeng letter hah, and the sequence of U+110DD sora 
sompeng letter rah and U+110D4 sora sompeng letter dah is used to write retroflex 
[ɽ]. There is also an additional “auxiliary” U+110E8 sora sompeng letter mae used to 
transcribe foreign sounds. 

Character Names. Consonant letter names for Sora Sompeng are derived by adding [aʔa] 
(written ah) to the consonant. 
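
The naming rule can be checked against the letters mentioned above with Python's 
unicodedata module. The helper sora_consonant_name is hypothetical, and the lookup 
assumes a Python build whose Unicode database includes Sora Sompeng. The rule applies 
only to consonant letters; vowel letters such as ih and oh, and the auxiliary mae, do not 
follow it. 

```python
import unicodedata


def sora_consonant_name(consonant):
    """Build the character name of a Sora Sompeng consonant letter."""
    return "SORA SOMPENG LETTER " + consonant.upper() + "AH"


# The three consonant letters named in this section: rah, dah, and hah.
for consonant in ("r", "d", "h"):
    name = sora_consonant_name(consonant)
    print(name, "U+%04X" % ord(unicodedata.lookup(name)))
```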

Punctuation. Sora Sompeng uses Western-style punctuation. 

Line Breaking. Letters and digits behave as in Latin and other alphabetic scripts. 



15.16 Dogra 

Dogra: U+11800-U+1184F 

In the 17th century, the Dogra script was used to write the Dogri language in Jammu and 
Kashmir in the northern region of the Indian subcontinent. Dogri is an Indo-Aryan 
language now usually written with the Devanagari script. The Dogra script was standardized 
in the 1860s, and is closely related to the Takri script. The official form, known as “Name 
Dogra Akkar” or “New Dogra Script,” appears in administrative documents, on currency, 
postcards, postage stamps, and in literary works. The unofficial, common written form of 
the script is called “Old Dogra.” The glyphs in the code chart are based on New Dogra. 

Structure. Dogra is an abugida, based on Brahmi. It is written left to right. The script 
includes a virama, U+11839 dogra sign virama, to create conjuncts and to suppress the 
inherent vowel. 

Vowels. Because the glyphs for Dogra vowel letters changed over time, the phonetic value 
of three vowel letters varies between New and Old Dogra. Old Dogra uses U+11802 dogra 
letter i for u, U+11803 dogra letter ii for i, and U+11804 dogra letter u for o and au. 
The shapes of the vowel signs also vary between Old and New Dogra. Distinct fonts can 
be used to reflect the Old Dogra vowel shapes, as opposed to the New Dogra shapes. 

A feature of Dogra is that a dependent vowel may be represented either by the 
independent vowel letter or by the dependent vowel sign. For example, the syllable ke may 
be represented by <ka, e> or by <ka, vowel sign e>. 
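
The two spellings of ke are different code point sequences, as the following sketch 
shows. It assumes a Python build whose Unicode database includes the Dogra block; the 
helper names dogra and ke_spellings are hypothetical. 

```python
import unicodedata


def dogra(name):
    """Look up a Dogra character by the part of its name after 'DOGRA '."""
    return unicodedata.lookup("DOGRA " + name)


def ke_spellings():
    """Return the two spellings of ke: <ka, e> and <ka, vowel sign e>."""
    ka = dogra("LETTER KA")
    return ka + dogra("LETTER E"), ka + dogra("VOWEL SIGN E")


with_letter, with_sign = ke_spellings()
for spelling in (with_letter, with_sign):
    print([unicodedata.name(c) for c in spelling])
print(with_letter == with_sign)  # False
```

Normalization does not unify the two spellings, so software that searches or compares 
Dogra text may need to decide for itself whether to treat them as equivalent. 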

Characters Used to Represent Sanskrit. U+11831 dogra vowel sign vocalic r and 
U+11828 dogra letter ssa are used in New Dogra to represent sounds of Sanskrit origin. 

Consonant Conjuncts. Consonant clusters in Dogra may be rendered in different ways. 
The most common method is to place a virama beneath each bare consonant. For example, 
pra is represented with the sequence <pa, virama, ra>. The second method is to use 
half-forms if the graphical structure permits it. For example, the conjunct shra, contained 
in the Sanskrit honorific shrii, appears regularly with a half-form. It is represented by 
the sequence <sha, virama, ra>. The third way to form consonant clusters is with ligatures. 
The ligature may be an atomic ligature, such as ksha, which is represented with the 
sequence <ka, virama, ssa>, or a combination of two consonants in which the individual 
shape of each letter is visible. For example, sta is written by the sequence <sa, 
virama, tta>. There is no evidence that special conjunct forms of ra occur. 
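
Whichever rendering a font chooses, the encoded sequence has the same shape: consonants 
joined by the virama. The sketch below builds such sequences by character name. It 
assumes a Python build whose Unicode database includes the Dogra block, and the helper 
name dogra_cluster is hypothetical. 

```python
import unicodedata

VIRAMA = unicodedata.lookup("DOGRA SIGN VIRAMA")


def dogra_cluster(*consonants):
    """Join romanized consonant names into <C, virama, C, ...>."""
    letters = [
        unicodedata.lookup("DOGRA LETTER " + name.upper())
        for name in consonants
    ]
    return VIRAMA.join(letters)


examples = {
    "pra": ("pa", "ra"),     # visible virama
    "shra": ("sha", "ra"),   # half-form
    "ksha": ("ka", "ssa"),   # atomic ligature
}
for label, parts in examples.items():
    sequence = dogra_cluster(*parts)
    print(label, [unicodedata.name(c) for c in sequence])
```

The choice among a visible virama, a half-form, and a ligature is left to the font; it is 
not recorded in the text. 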

Other Symbols. U+11837 dogra sign anusvara indicates nasalization, and U+11838 
dogra sign visarga indicates post-vocalic aspiration in words of Sanskrit origin, while 
U+1183A dogra sign nukta is used to transcribe sounds that are not native to the Dogri 
language. 

Punctuation. U+1183B dogra abbreviation sign denotes abbreviations. U+0964 
devanagari danda and U+0965 devanagari double danda indicate the ends of sentences 
and paragraphs. 

Digits and Number Forms. Digits in Dogra vary across written and printed sources: some 
Old Dogra digits resemble Takri digits, while digits in some New Dogra documents 
resemble Devanagari. Because of this wide variation, script-specific digits have not been 
encoded. Devanagari digits should be used to represent digits in Dogra text. For 
representation of Dogra fraction and currency signs, use characters from the Common Indic 
Number Forms block. 



